84 research outputs found

    A Graph-Theoretic Barcode Ordering Model for Linked-Reads

    Given a set of intervals on the real line, an interval graph records these intervals as nodes and their intersections as edges. Identifying (i.e. merging) pairs of nodes in an interval graph results in a multiple-interval graph. Given only the nodes and the edges of the multiple-interval graph, without knowing the underlying intervals, we are interested in the following questions. Can one determine how many intervals correspond to each node? Can one compute a walk over the multiple-interval graph nodes that reflects the ordering of the original intervals? These questions are closely related to linked-read DNA sequencing, where barcodes are assigned to long molecules whose intersection graph forms an interval graph. Each barcode may correspond to multiple molecules, which complicates downstream analysis and corresponds to the identification of nodes of the corresponding interval graph. Resolving the above graph-theoretic problems would facilitate analyses of linked-read sequencing data by enabling the conceptual separation of barcodes into molecules and by providing, through the order of the molecules, a skeleton for accurately assembling the genome. Here, we propose a framework that takes as input an arbitrary intersection graph (such as an overlap graph of barcodes) and constructs a heuristic approximation of the ordering of the original intervals.
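
The definitions in this abstract can be illustrated with a small sketch. It is not the authors' framework: the molecule names, coordinates, and barcode assignment below are invented for illustration, and intervals are assumed to be closed. The sketch builds an interval graph from known intervals, then merges nodes that share a barcode to obtain the multiple-interval graph.

```python
from itertools import combinations

def interval_graph(intervals):
    """Build the edge set of an interval graph.

    intervals: dict mapping a node name to a closed interval (start, end).
    Returns a set of frozenset({u, v}) edges, one per intersecting pair.
    """
    edges = set()
    for (a, (s1, e1)), (b, (s2, e2)) in combinations(intervals.items(), 2):
        if s1 <= e2 and s2 <= e1:  # two closed intervals intersect
            edges.add(frozenset((a, b)))
    return edges

def identify_nodes(edges, label):
    """Merge (identify) nodes that carry the same label.

    label: dict mapping each original node to its merged node,
    e.g. molecule -> barcode. Edges inside one label are dropped.
    """
    merged = set()
    for edge in edges:
        u, v = tuple(edge)
        if label[u] != label[v]:
            merged.add(frozenset((label[u], label[v])))
    return merged

# Hypothetical data: four molecules along a genome; barcode "A" tags two of them.
molecules = {"m1": (0, 10), "m2": (8, 20), "m3": (18, 30), "m4": (28, 40)}
barcode_of = {"m1": "A", "m2": "B", "m3": "C", "m4": "A"}

g = interval_graph(molecules)          # path m1 - m2 - m3 - m4
mg = identify_nodes(g, barcode_of)     # triangle on barcodes A, B, C
```

In this toy case the molecule graph is a path, while the barcode graph is a triangle: the walk A, B, C, A over the merged graph visits barcode A twice, which reflects that A corresponds to two intervals.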

    Debugging long-read genome and metagenome assemblies using string graph analysis

    Third-generation long-read sequencing technologies tackle the repeat problem in genome assembly by producing reads that are long enough to span most repeat instances. In principle, one expects that with such reads most bacterial genomes will be assembled into a single contig. In practice, however, some datasets fail to be perfectly assembled even with leading assemblers, and are fragmented into a handful of contigs. As a means to investigate those cases, we consider the string graphs that are generated by assemblers during intermediate stages of the assembly process. We seek to establish a coherent framework for analyzing these graphs, in the hope that they will help us determine the biological causes that led the assembler to output shorter contigs. This poster presents some preliminary results of such an analysis.

    Impact of early complications on outcomes in patients with implantable cardioverter-defibrillator for primary prevention

    Background - The lifesaving benefit of implantable cardioverter-defibrillators (ICDs) has been demonstrated. Their use has increased considerably in the past decade, but related complications have become a major concern. Objective - The purpose of this study was to assess the incidence and effect on outcomes of early (≤30 days) complications after ICD implantation for primary prevention in a large French population. Methods - We analyzed data from 5539 patients from the multicenter French DAI-PP (Défibrillateur Automatique Implantable-Prévention Primaire) registry (2002-2012) who had coronary artery disease or dilated cardiomyopathy and were implanted with an ICD for primary prevention. Results - Overall, early complications occurred in 707 patients (13.5%), mainly related to lead dislodgment or hematoma (57%). Independent factors associated with occurrence of early complications were severe renal impairment (odds ratio [OR] 1.66, 95% confidence interval [CI] 1.17-2.37, P = .02), age ≥75 years (OR 1.01, 95% CI 1.00-1.02, P = .03), cardiac resynchronization therapy (OR 1.58, 95% CI 1.16-2.17, P = .01), and anticoagulant therapy (OR 1.28, 95% CI 1.02-1.61, P = .03). During a mean ± SD follow-up of 3.1 ± 2.3 years, 824 (15.8%) patients experienced ≥1 late complication (>30 days), and 782 (14.9%) patients died. After adjustment, early complications remained associated with occurrence of late complications (OR 2.15, 95% CI 1.73-2.66, P < .0001) and mortality (OR 1.70, 95% CI 1.34-2.17, P = .003). Conclusion - Early complications are common after ICD implantation for primary prevention, occurring in 1 in 7 patients, and are associated with an increased risk of late complications and overall mortality. Further studies are needed to investigate the underlying mechanisms of such associations.

    Heart Rate and Risk of Cancer Death in Healthy Men

    BACKGROUND: Data from several previous studies examining heart rate and cardiovascular risk have hinted at a possible relationship between heart rate and non-cardiac mortality. We thus systematically examined the predictive value of heart-rate variables on the subsequent risk of death from cancer. METHODS: In the Paris Prospective Study I, 6101 asymptomatic French working men aged 42 to 53 years, free of clinically detectable cardiovascular disease and cancer, underwent a standardized graded exercise test between 1967 and 1972. Resting heart rate, heart-rate increase during exercise, and decrease during recovery were measured. Change in resting heart rate over 5 years was also available in 5139 men. Mortality, including 758 cancer deaths, was assessed over the 25 years of follow-up. FINDINGS: There were strong, graded and significant relationships between all heart-rate parameters and subsequent cancer deaths. After adjustment for age and tobacco consumption, and compared with the lowest quartile, those in the highest quartile for resting heart rate had a relative risk of 2.4 for cancer deaths (95% confidence interval: 1.9-2.9, p<0.0001). This was similar after adjustment for traditional cardiovascular risk factors and was observed for the commonest malignancies (respiratory and gastrointestinal). Similarly, significant relationships with cancer death were observed for poor heart-rate increase during exercise, poor decrease during recovery and greater heart-rate increase over time (p<0.0001 for all). INTERPRETATION: Resting and exercise heart rate had consistent, graded and highly significant associations with subsequent cancer mortality in men.

    Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads

    Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing with continuous long-read or high-fidelity sequencing data. Employing this strategy, we produced a completely phased de novo genome assembly for each haplotype of an individual of Puerto Rican descent (HG00733) in the absence of parental data. The assemblies are accurate (quality value > 40) and highly contiguous (contig N50 > 23 Mbp) with low switch error rates (0.17%), providing fully phased single-nucleotide variants, indels and structural variants. A comparison of Oxford Nanopore Technologies and Pacific Biosciences phased assemblies identified 154 regions that are preferential sites of contig breaks, irrespective of sequencing technology or phasing algorithms

    Group A Streptococcus, Acute Rheumatic Fever and Rheumatic Heart Disease: Epidemiology and Clinical Considerations

    Nouveaux composants à la périphérie des outils d'assemblages des génomes long read.

    The sequencing of genetic information provides a better understanding of a large number of biological phenomena, e.g. genetic diseases, speciation events, and fundamental mechanisms of cell function. Sequencing techniques have evolved considerably since the Sanger method (1977). Nowadays, third-generation sequencing technologies greatly reduce the cost of sequencing complete genomes. They produce longer reads (sequence fragments), but require the design of specific assembly tools that take into account the high error rates in the produced fragments. The study of methods used by third-generation read assembly pipelines has revealed that improvements in assembly were possible without modifying the assembly tools themselves. Some improvements are thus proposed in this thesis work, and were implemented through publicly available tools. yacrd and fpa pre-process the set of reads prior to assembly, in order to improve the efficiency and quality of the assembly process. knot combines information from both the input reads and an assembly, in order to provide insights on how to improve the contiguity of an assembly.

    Graph analysis of fragmented long-read bacterial genome assemblies

    Motivation: Long-read genome assembly tools are expected to reconstruct bacterial genomes nearly perfectly; however, they still produce fragmented assemblies in some cases. It would be beneficial to understand whether these cases are intrinsically impossible to resolve, or if assemblers are at fault, implying that genomes could be refined or even finished with little to no additional experimental cost. Results: We propose a set of computational techniques to assist inspection of fragmented bacterial genome assemblies, through careful analysis of assembly graphs. By finding paths of overlapping raw reads between pairs of contigs, we recover potential short-range connections between contigs that were lost during the assembly process. We show that our procedure recovers 45% of missing contig adjacencies in fragmented Canu assemblies, on samples from the NCTC bacterial sequencing project. We also observe that a simple procedure based on enumerating weighted Hamiltonian cycles can suggest likely contig orderings. In our tests, the correct contig order is ranked first in half of the cases and within the top-3 predictions in nearly all evaluated cases, providing a direction for finishing fragmented long-read assemblies.
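
The idea of ranking contig orderings by enumerating weighted Hamiltonian cycles can be sketched by brute force. This is an illustration only, not the procedure from the paper: the contig names and weights are hypothetical, and it is an assumption here that a lower weight means stronger evidence for an adjacency (for example, a shorter path of reads between two contigs).

```python
from itertools import permutations

def ranked_hamiltonian_cycles(nodes, weight):
    """Enumerate Hamiltonian cycles over `nodes`, sorted by total weight.

    nodes: list of contig names.
    weight: dict mapping frozenset({u, v}) to a number; pairs that are
            absent from the dict are treated as unconnected.
    Each cycle is reported once: the first node is fixed to remove
    rotations, and mirror-image orderings are skipped.
    """
    first, rest = nodes[0], nodes[1:]
    cycles = []
    for perm in permutations(rest):
        if len(perm) > 1 and perm[0] > perm[-1]:
            continue  # the reversed ordering describes the same cycle
        order = (first,) + perm
        pairs = zip(order, order[1:] + (first,))
        try:
            total = sum(weight[frozenset(p)] for p in pairs)
        except KeyError:
            continue  # some consecutive pair has no recorded connection
        cycles.append((total, order))
    return sorted(cycles)

# Hypothetical weights between four contigs (lower = better supported).
contigs = ["c1", "c2", "c3", "c4"]
w = {
    frozenset(("c1", "c2")): 1, frozenset(("c2", "c3")): 1,
    frozenset(("c3", "c4")): 1, frozenset(("c4", "c1")): 1,
    frozenset(("c1", "c3")): 5, frozenset(("c2", "c4")): 5,
}
ranked = ranked_hamiltonian_cycles(contigs, w)
best_weight, best_order = ranked[0]
```

Exhaustive enumeration grows factorially with the number of contigs, so a sketch like this is only practical for the small contig counts typical of a fragmented bacterial assembly.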